HDFS integration test failure when using Accumulo 2.0.1 with Kerberos #3134

Open
GCHQDeveloper314 opened this issue Dec 12, 2023 · 0 comments
Labels: accumulo-store (Specific to/touches the accumulo-store module), bug (Confirmed or suspected bug)

Describe the bug
When the AddElementsFromHdfsLoaderIT integration tests (which run as part of the Accumulo Store) are run against an Accumulo 2.0.1 cluster configured to use Kerberos, they fail with an Accumulo error stating that it cannot "rename files across volumes". The full message is shown below and originates in this line of Accumulo code (VolumeManagerImpl.rename, according to the stack trace below).

It isn't clear whether the error is correct, i.e. whether the rename really is across volumes. The error only occurs with Accumulo 2.0.1 and Kerberos together: it doesn't occur with Accumulo 1.9.3, nor with 2.0.1 without Kerberos. The rename is the same in every case (see Additional context), which may indicate a problem with Accumulo itself. The rename should be allowed either in all situations or in none, and whether Kerberos is in use shouldn't have any effect.

To debug this issue, investigate the FileSystem objects here (set with this method, which uses code calling this method).
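
As a first sanity check before looking at the FileSystem objects, the two paths from the error message can be compared by scheme and authority alone. The following is a minimal sketch using only java.net.URI. The class and method names are hypothetical, and this is not Accumulo's own check, which appears to work on FileSystem objects rather than on URIs. If the URIs match, as they do here, the "across volumes" decision is presumably being made on something other than the path strings.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical helper, not part of Gaffer or Accumulo.
class VolumeCheckSketch {

    /** Returns true when both paths share the same scheme and authority (host:port). */
    static boolean sameFileSystemUri(String src, String dst) {
        URI s = URI.create(src);
        URI d = URI.create(dst);
        return Objects.equals(s.getScheme(), d.getScheme())
                && Objects.equals(s.getAuthority(), d.getAuthority());
    }

    public static void main(String[] args) {
        // Source and destination copied from the accumulo-master error message.
        String src = "hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/outputDir/part-r-00000.rf";
        String dst = "hdfs://hdfs-namenode.gaffer:9000/accumulo/tables/2/b-000003z/I0000040.rf";
        System.out.println(sameFileSystemUri(src, dst)); // prints "true"
    }
}
```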

Ideally, any Gaffer code changes should be made after #2743, which is about improving the documentation for these tests.

Expected behaviour
These tests should work for Accumulo 2+ the same as they do for Accumulo 1.

Stack trace and errors

Log extract from integration tests container (irrelevant info removed):

2023-12-12 11:47:58 integration.loader.AddElementsFromHdfsLoaderIT$AddElementsFromHdfsLoader INFO  - using root dir: hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843
...
2023-12-12 11:47:59 accumulostore.utils.TableUtils INFO  - Creating table integrationTestGraph as user gaffer/[email protected]
...
2023-12-12 11:48:01 hdfs.handler.AddElementsFromHdfsHandler INFO  - Checking that the correct HDFS directories exist
2023-12-12 11:48:01 hdfs.handler.AddElementsFromHdfsHandler INFO  - Ensuring output directory hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/outputDir doesn't exist
...
2023-12-12 11:48:01 job.tool.AddElementsFromHdfsTool INFO  - Adding elements from HDFS
...
2023-12-12 11:48:01 job.factory.AccumuloAddElementsFromHdfsJobFactory INFO  - Creating splits file in location hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/splitsDir/splits from table integrationTestGraph
...
2023-12-12 11:48:03 hadoop.mapred.MapTask INFO  - Processing split: hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/inputDir3/file.txt:0+21422
...
2023-12-12 11:48:06 job.tool.ImportElementsToAccumuloTool INFO  - Ensuring failure directory hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/failureDir exists
2023-12-12 11:48:06 job.tool.ImportElementsToAccumuloTool INFO  - Failure directory doesn't exist so creating: hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/failureDir
2023-12-12 11:48:06 accumulostore.utils.IngestUtils INFO  - Setting permission rwxrwxrwx on directory hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/failureDir and all files within
2023-12-12 11:48:06 job.tool.ImportElementsToAccumuloTool INFO  - Removing file hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/outputDir/_SUCCESS
2023-12-12 11:48:06 accumulostore.utils.IngestUtils INFO  - Setting permission rwxrwxrwx on directory hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/outputDir and all files within
2023-12-12 11:48:06 job.tool.ImportElementsToAccumuloTool INFO  - Importing files in hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/outputDir to table integrationTestGraph
...
2023-12-12 11:48:06 hdfs.handler.AddElementsFromHdfsHandler ERROR  - Failed to import elements into Accumulo: Internal error processing waitForFateOperation

Stack trace for the above:

uk.gov.gchq.gaffer.operation.OperationException: Failed to import elements into Accumulo
     at uk.gov.gchq.gaffer.accumulostore.operation.hdfs.handler.AddElementsFromHdfsHandler.importElements(AddElementsFromHdfsHandler.java:234)
...
Caused by: org.apache.accumulo.core.client.AccumuloException: Internal error processing waitForFateOperation
    at org.apache.accumulo.core.clientImpl.TableOperationsImpl.doFateOperation(TableOperationsImpl.java:388)
    at org.apache.accumulo.core.clientImpl.TableOperationsImpl.doFateOperation(TableOperationsImpl.java:342)
    at org.apache.accumulo.core.clientImpl.TableOperationsImpl.doTableFateOperation(TableOperationsImpl.java:1599)
    at org.apache.accumulo.core.clientImpl.TableOperationsImpl.importDirectory(TableOperationsImpl.java:1207)
    at uk.gov.gchq.gaffer.accumulostore.operation.hdfs.handler.job.tool.ImportElementsToAccumuloTool.run(ImportElementsToAccumuloTool.java:78)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:81)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:95)
    at uk.gov.gchq.gaffer.accumulostore.operation.hdfs.handler.AddElementsFromHdfsHandler.importElements(AddElementsFromHdfsHandler.java:230)
    ... 119 more
Caused by: org.apache.thrift.TApplicationException: Internal error processing waitForFateOperation

Log extract from accumulo-master container:

2023-12-12 11:48:06,899 [thrift.ProcessFunction] ERROR: Internal error processing waitForFateOperation
java.lang.UnsupportedOperationException: Cannot rename files across volumes: hdfs://hdfs-namenode.gaffer:9000/tmp/junit8641700344797179843/outputDir/part-r-00000.rf -> hdfs://hdfs-namenode.gaffer:9000/accumulo/tables/2/b-000003z/I0000040.rf
    at org.apache.accumulo.server.fs.VolumeManagerImpl.rename(VolumeManagerImpl.java:319)
    at org.apache.accumulo.master.tableOps.bulkVer1.BulkImport.lambda$prepareBulkImport$0(BulkImport.java:251)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
    at java.lang.Thread.run(Thread.java:750)

Platform

  • Gaffer Version: 2.0.0

Additional context
Looking at the namenode logs it's possible to see the relevant files being created and renamed by the HDFS integration tests. This also shows how Accumulo makes a new directory, looks at the directory of the file to import, but then (having issued the error above) removes the new directory it created:

2023-12-12 11:48:04 INFO  audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS)      ip=/172.26.0.2  cmd=create      src=/tmp/junit8641700344797179843/outputDir/_temporary/0/_temporary/attempt_local1986856988_0001_r_000000_0/part-r-00000.rf    dst=null        perm=gaffer:supergroup:rw-r--r--        proto=rpc
...
2023-12-12 11:48:05 INFO  audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS)      ip=/172.26.0.2  cmd=getfileinfo src=/tmp/junit8641700344797179843/outputDir/_temporary/0/_temporary/attempt_local1986856988_0001_r_000000_0    dst=null        perm=null       proto=rpc
2023-12-12 11:48:05 INFO  audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS)      ip=/172.26.0.2  cmd=getfileinfo src=/tmp/junit8641700344797179843/outputDir    dst=null        perm=null       proto=rpc
2023-12-12 11:48:05 INFO  audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS)      ip=/172.26.0.2  cmd=listStatus  src=/tmp/junit8641700344797179843/outputDir/_temporary/0/_temporary/attempt_local1986856988_0001_r_000000_0    dst=null        perm=null       proto=rpc
2023-12-12 11:48:05 INFO  audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS)      ip=/172.26.0.2  cmd=getfileinfo src=/tmp/junit8641700344797179843/outputDir/part-r-00000.rf    dst=null        perm=null       proto=rpc
2023-12-12 11:48:05 INFO  audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS)      ip=/172.26.0.2  cmd=rename      src=/tmp/junit8641700344797179843/outputDir/_temporary/0/_temporary/attempt_local1986856988_0001_r_000000_0/part-r-00000.rf    dst=/tmp/junit8641700344797179843/outputDir/part-r-00000.rf     perm=gaffer:supergroup:rw-r--r--      proto=rpc
2023-12-12 11:48:05 INFO  audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS)      ip=/172.26.0.2  cmd=delete      src=/tmp/junit8641700344797179843/outputDir/_temporary dst=null        perm=null       proto=rpc
...
2023-12-12 11:48:06 INFO  audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS)      ip=/172.26.0.2  cmd=getfileinfo src=/tmp/junit8641700344797179843/outputDir    dst=null        perm=null       proto=rpc
2023-12-12 11:48:06 INFO  audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS)      ip=/172.26.0.2  cmd=setPermission       src=/tmp/junit8641700344797179843/outputDir    dst=null        perm=gaffer:supergroup:rwxrwxrwx        proto=rpc
2023-12-12 11:48:06 INFO  audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS)      ip=/172.26.0.2  cmd=listStatus  src=/tmp/junit8641700344797179843/outputDir    dst=null        perm=null       proto=rpc
2023-12-12 11:48:06 INFO  audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS)      ip=/172.26.0.2  cmd=setPermission       src=/tmp/junit8641700344797179843/outputDir/part-r-00000.rf    dst=null        perm=gaffer:supergroup:rwxrwxrwx        proto=rpc
2023-12-12 11:48:06 INFO  audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS)      ip=/172.26.0.2  cmd=getfileinfo src=/tmp/junit8641700344797179843/outputDir    dst=null        perm=null       proto=rpc
2023-12-12 11:48:06 INFO  audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS)      ip=/172.26.0.2  cmd=getfileinfo src=/tmp/junit8641700344797179843/failureDir   dst=null        perm=null       proto=rpc
2023-12-12 11:48:06 INFO  audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS)      ip=/172.26.0.2  cmd=listStatus  src=/tmp/junit8641700344797179843/failureDir   dst=null        perm=null       proto=rpc
2023-12-12 11:48:06 INFO  audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS)       ip=/172.26.0.7  cmd=getfileinfo src=/tmp/junit8641700344797179843/failureDir   dst=null        perm=null       proto=rpc
2023-12-12 11:48:06 INFO  audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS)       ip=/172.26.0.7  cmd=listStatus  src=/tmp/junit8641700344797179843/failureDir   dst=null        perm=null       proto=rpc
2023-12-12 11:48:06 INFO  audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS)       ip=/172.26.0.7  cmd=mkdirs      src=/accumulo/tables/2    dst=null perm=accumulo:supergroup:rwxr-xr-x      proto=rpc
2023-12-12 11:48:06 INFO  audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS)       ip=/172.26.0.7  cmd=getfileinfo src=/accumulo/tables/2/b-000003z       dst=null        perm=null       proto=rpc
2023-12-12 11:48:06 INFO  audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS)       ip=/172.26.0.7  cmd=mkdirs      src=/accumulo/tables/2/b-000003z       dst=null        perm=accumulo:supergroup:rwxr-xr-x      proto=rpc
2023-12-12 11:48:06 INFO  audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS)       ip=/172.26.0.7  cmd=listStatus  src=/tmp/junit8641700344797179843/outputDir    dst=null        perm=null       proto=rpc
2023-12-12 11:48:07 INFO  audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS)       ip=/172.26.0.7  cmd=delete      src=/accumulo/tables/2    dst=null perm=null       proto=rpc

These are the expected operations (taken from an integration test run using Accumulo 1.9.3), showing the successful renaming of the file by Accumulo:

2023-12-12 13:30:46 INFO  audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS)      ip=/172.20.0.3  cmd=setPermission       src=/tmp/junit8460692230461269805/outputDir/part-r-00000.rf    dst=null        perm=gaffer:supergroup:rwxrwxrwx        proto=rpc
2023-12-12 13:30:46 INFO  audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS)      ip=/172.20.0.3  cmd=getfileinfo src=/tmp/junit8460692230461269805/outputDir    dst=null        perm=null       proto=rpc
2023-12-12 13:30:46 INFO  audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS)      ip=/172.20.0.3  cmd=getfileinfo src=/tmp/junit8460692230461269805/failureDir   dst=null        perm=null       proto=rpc
2023-12-12 13:30:46 INFO  audit:8145 - allowed=true ugi=gaffer/[email protected] (auth:KERBEROS)      ip=/172.20.0.3  cmd=listStatus  src=/tmp/junit8460692230461269805/failureDir   dst=null        perm=null       proto=rpc
2023-12-12 13:30:47 INFO  audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS)       ip=/172.20.0.7  cmd=getfileinfo src=/tmp/junit8460692230461269805/failureDir   dst=null        perm=null       proto=rpc
2023-12-12 13:30:47 INFO  audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS)       ip=/172.20.0.7  cmd=listStatus  src=/tmp/junit8460692230461269805/failureDir   dst=null        perm=null       proto=rpc
2023-12-12 13:30:47 INFO  audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS)       ip=/172.20.0.7  cmd=mkdirs      src=/accumulo/tables/2   dst=null perm=accumulo:supergroup:rwxr-xr-x      proto=rpc
2023-12-12 13:30:47 INFO  audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS)       ip=/172.20.0.7  cmd=getfileinfo src=/accumulo/tables/2/b-000005c       dst=null        perm=null       proto=rpc
2023-12-12 13:30:47 INFO  audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS)       ip=/172.20.0.7  cmd=mkdirs      src=/accumulo/tables/2/b-000005c       dst=null        perm=accumulo:supergroup:rwxr-xr-x      proto=rpc
2023-12-12 13:30:47 INFO  audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS)       ip=/172.20.0.7  cmd=listStatus  src=/tmp/junit8460692230461269805/outputDir    dst=null        perm=null       proto=rpc
2023-12-12 13:30:47 INFO  audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS)       ip=/172.20.0.7  cmd=rename      src=/tmp/junit8460692230461269805/outputDir/part-r-00000.rf    dst=/accumulo/tables/2/b-000005c/I000005d.rf    perm=gaffer:supergroup:rwxrwxrwx        proto=rpc
2023-12-12 13:30:47 INFO  audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS)       ip=/172.20.0.7  cmd=listStatus  src=/accumulo/tables/2/b-000005c       dst=null        perm=null       proto=rpc
...
2023-12-12 13:30:47 INFO  audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS)      ip=/172.20.0.10 cmd=open        src=/accumulo/tables/2/b-000005c/I000005d.rf   dst=null        perm=null       proto=rpc
2023-12-12 13:30:47 INFO  audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS)      ip=/172.20.0.10 cmd=getfileinfo src=/accumulo/tables/2/b-000005c/I000005d.rf   dst=null        perm=null       proto=rpc
2023-12-12 13:30:47 INFO  audit:8145 - allowed=true ugi=accumulo/[email protected] (auth:KERBEROS)      ip=/172.20.0.10 cmd=contentSummary      src=/accumulo/tables/2/b-000005c/I000005d.rf   dst=null        perm=null       proto=rpc
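
To compare the two runs without reading every audit line, the key=value fields can be pulled out and filtered for a rename into the Accumulo tables directory. The sketch below assumes the audit format shown in the extracts above. The class and method names are hypothetical, and the sample line is shortened (the ugi field is omitted). Applied to the logs in this issue, the 1.9.3 run contains such a rename, while the 2.0.1 run does not.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading namenode audit lines; not part of Hadoop.
class AuditLineSketch {

    // Matches key=value pairs such as cmd=rename or dst=/accumulo/tables/2/...
    private static final Pattern FIELD = Pattern.compile("(\\w+)=(\\S+)");

    /** Extracts the key=value fields of one audit line, in order of appearance. */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    /** True when the line records a rename whose destination starts with dstPrefix. */
    static boolean isRenameInto(String line, String dstPrefix) {
        Map<String, String> f = parse(line);
        return "rename".equals(f.get("cmd"))
                && f.getOrDefault("dst", "").startsWith(dstPrefix);
    }

    public static void main(String[] args) {
        // Shortened copy of the rename line from the Accumulo 1.9.3 run.
        String line = "2023-12-12 13:30:47 INFO  audit:8145 - allowed=true ip=/172.20.0.7  cmd=rename      "
                + "src=/tmp/junit8460692230461269805/outputDir/part-r-00000.rf    "
                + "dst=/accumulo/tables/2/b-000005c/I000005d.rf    "
                + "perm=gaffer:supergroup:rwxrwxrwx        proto=rpc";
        System.out.println(isRenameInto(line, "/accumulo/tables/")); // prints "true"
    }
}
```
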
GCHQDeveloper314 added the bug and accumulo-store labels Dec 12, 2023
GCHQDeveloper314 added this to the Backlog milestone Dec 12, 2023