This is my first attempt at using Apache Sqoop to import a SQL Server table (6 columns, 4 records) into Hive. The command is below.
sqoop import --connect "jdbc:sqlserver://192.168.10.101:1433;database=Testdb" --username abc --password abc --table "DimEmployee" --create-hive-table --hive-import --hive-table DboDimEmployee
The run started fine but stalled at this output:
19/01/25 13:19:36 INFO mapreduce.Job: Running job: job_1548438714494_0003
I checked the Hadoop web UI. This particular application has no resources assigned and its progress is 0%. I am not sure what I did wrong.
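To get more detail than the web UI shows, the standard YARN CLI can report node capacity and application diagnostics. A sketch of the checks (the application id is the one from my latest run; the commands assume the Hadoop binaries are on the PATH):

```shell
# List all NodeManagers with their total and used memory; a node whose
# free memory is below the scheduler minimum cannot host any container.
yarn node -list -all

# Show the current state and diagnostics of the stuck application
# (id taken from the log of my latest attempt).
yarn application -status application_1548697949348_0001
```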
Below is some additional information.
- The SQL Server connection parameters are correct, and I have verified the connection from the Hadoop side.
- Hive works fine. I was able to create a database and a table in Hive.
- The entire Hadoop system runs in VirtualBox on my laptop. The name node has 4 GB of memory and the data node has 1 GB.
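As a connectivity sanity check, a minimal Sqoop call that only talks JDBC and launches no MapReduce job can be used, for example listing the tables in the database (`-P` prompts for the password instead of putting it on the command line):

```shell
# List the tables Sqoop can see in Testdb over JDBC.
# This succeeds or fails without launching any YARN containers,
# so it isolates connectivity problems from cluster problems.
sqoop list-tables \
  --connect "jdbc:sqlserver://192.168.10.101:1433;database=Testdb" \
  --username abc -P
```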
Below are the only memory-related settings I configured in Hadoop. I am not sure whether the problem is memory-related; I am posting them just in case.
vi mapred-site.xml

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>256</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>128</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>128</value>
</property>
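Container allocation is also bounded by yarn-site.xml: if the NodeManager's advertised memory or the scheduler's minimum allocation does not fit the 1 GB data node, YARN can accept the job but never assign it a container. A sketch of the relevant properties (the values here are illustrative, not my actual configuration):

```xml
<!-- yarn-site.xml: illustrative values, not my actual configuration -->
<property>
  <!-- Memory the NodeManager offers to containers; on a 1 GB data
       node this must leave room for the OS and daemons. -->
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>768</value>
</property>
<property>
  <!-- Smallest container YARN will allocate; requests are rounded up. -->
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>128</value>
</property>
<property>
  <!-- Largest container a single request may ask for. -->
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>768</value>
</property>
```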
I have never seen any map or reduce progress, i.e. no "map %" / "reduce %" lines.
My installation does not include HBase, HCatalog, Accumulo, or ZooKeeper. I don't think I need them, but I could be wrong.
Below are all the execution messages I got from Sqoop.
Warning: /home/admin1/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/admin1/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/admin1/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/admin1/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/01/28 09:56:12 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/01/28 09:56:12 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/01/28 09:56:12 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/01/28 09:56:12 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
19/01/28 09:56:12 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
19/01/28 09:56:12 INFO manager.SqlManager: Using default fetchSize of 1000
19/01/28 09:56:12 INFO tool.CodeGenTool: Beginning code generation
19/01/28 09:56:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM DimEmployee AS t WHERE 1=0
19/01/28 09:56:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM DimEmployee AS t WHERE 1=0
19/01/28 09:56:12 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/admin1/hadoop
Note: /tmp/sqoop-admin1/compile/e8e0b042e5ecc16c39484556762dae8a/DimEmployee.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/01/28 09:56:17 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-admin1/compile/e8e0b042e5ecc16c39484556762dae8a/DimEmployee.jar
19/01/28 09:56:18 INFO mapreduce.ImportJobBase: Beginning import of DimEmployee
19/01/28 09:56:18 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/01/28 09:56:18 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM DimEmployee AS t WHERE 1=0
19/01/28 09:56:19 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/01/28 09:56:19 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/01/28 09:56:29 INFO db.DBInputFormat: Using read commited transaction isolation
19/01/28 09:56:29 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(EmployeeKey), MAX(EmployeeKey) FROM DimEmployee
19/01/28 09:56:29 INFO db.IntegerSplitter: Split size: 73; Num splits: 4 from: 1 to: 296
19/01/28 09:56:29 INFO mapreduce.JobSubmitter: number of splits:4
19/01/28 09:56:29 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/01/28 09:56:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1548697949348_0001
19/01/28 09:56:31 INFO impl.YarnClientImpl: Submitted application application_1548697949348_0001
19/01/28 09:56:31 INFO mapreduce.Job: The url to track the job: http://name1:8088/proxy/application_1548697949348_0001/
19/01/28 09:56:31 INFO mapreduce.Job: Running job: job_1548697949348_0001
For reference, here is the variant of the command with an explicit --driver, which is the one that produced the log above:

sqoop import --connect "jdbc:sqlserver://192.168.10.101:1433;database=Testdb" --driver com.microsoft.sqlserver.jdbc.SQLServerDriver --username abc --password abc --table "DimEmployee" --create-hive-table --hive-import --hive-table DboDimEmployee