hadoop-hdfs-client splitoff is going to break code

hadoop-hdfs-client splitoff is going to break code

Steve Loughran-3
Just an FYI: the split of hadoop-hdfs into client and server modules is going to break things.

I know that because my code is broken: DFSConfigKeys is off the classpath, and so is HdfsConfiguration, the class I've been loading to force pickup of hdfs-site.xml. All missing.

This is because the hadoop-client POM now depends on hadoop-hdfs-client, not hadoop-hdfs, so the things I'm referencing are gone. I'm particularly sad about DFSConfigKeys, as everybody uses it as the one hard-coded source of HDFS constants. HDFS-6566 covers the issue of making this public, something that's been sitting around for a year.

I'm fixing my build by explicitly adding a hadoop-hdfs dependency.
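
For illustration, a downstream pom.xml could declare that dependency roughly as follows. This is a minimal sketch, not taken from the original message: the groupId and artifactId are the standard Hadoop coordinates, while the hadoop.version property is a placeholder assumed to be defined elsewhere in the build.

```xml
<!-- Sketch only: assumes a hadoop.version property is defined elsewhere in this POM. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>${hadoop.version}</version>
</dependency>
```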

Any application which used classes that have now been declared server-side isn't going to compile any more, which does appear to break the compatibility guidelines we've adopted, specifically "The hadoop-client artifact (maven groupId:artifactId) stays compatible within a major release":

http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html#Build_artifacts


We need to do one of:

1. Agree that this change is considered acceptable according to policy, and mark it as incompatible in hdfs/CHANGES.TXT
2. Change the POMs so that hadoop-client pulls in both hadoop-hdfs-client and the hadoop-hdfs server module, with downstream users free to exclude the server code
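
If option 2 were adopted, a downstream build that wanted only the client-side classes could exclude the server module again. The fragment below is a sketch under stated assumptions: it assumes hadoop-client then depends on hadoop-hdfs, and that a hadoop.version property (a placeholder) is defined in the build. Because a Maven exclusion also drops whatever the excluded artifact would have pulled in transitively, the sketch declares hadoop-hdfs-client explicitly in case it is only reachable through hadoop-hdfs.

```xml
<!-- Sketch only: assumes hadoop-client depends on hadoop-hdfs (option 2)
     and that hadoop.version is defined elsewhere in this POM. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
  <exclusions>
    <!-- Drop the server-side HDFS module from the client classpath. -->
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- Keep the client-side HDFS classes even if they only arrived via hadoop-hdfs. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs-client</artifactId>
  <version>${hadoop.version}</version>
</dependency>
```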

We unintentionally caused similar grief with the move of the s3n clients to hadoop-aws (HADOOP-11074), something we should have picked up and -1'd. This time we know the problems that are going to arise, so let's explicitly make a decision and share it with our users.

-steve

Re: hadoop-hdfs-client splitoff is going to break code

larry mccay-2
Interesting...

As long as #2 provides full backward compatibility and the ability to
explicitly exclude the server dependencies, that seems the best way to go.
That would get my non-binding +1.
:)

Perhaps we could add another artifact called hadoop-thin-client that would
not be backward compatible at some point?


Re: hadoop-hdfs-client splitoff is going to break code

Haohui Mai-3
Option 2 sounds good to me. It might make sense to make hadoop-client
directly depend on hadoop-hdfs?


Haohui

Re: hadoop-hdfs-client splitoff is going to break code

Ted Yu-3
In reply to this post by larry mccay-2
+1 on option 2.


Re: hadoop-hdfs-client splitoff is going to break code

Mingliang Liu
In reply to this post by Steve Loughran-3
The jira tracking this issue is: https://issues.apache.org/jira/browse/HDFS-9241

+1 on option 2.

I think it makes sense to make hadoop-client directly depend on hadoop-hdfs (which itself depends on hadoop-hdfs-client).

Ciao,

Mingliang Liu
Member of Technical Staff - HDFS,
Hortonworks Inc.
[hidden email]





Re: hadoop-hdfs-client splitoff is going to break code

Colin McCabe-3
In reply to this post by Steve Loughran-3
Thanks for being proactive here, Steve.  I think this is a good example of
why this change should have been done in a branch rather than having been
done directly in trunk.

regards,
Colin



Re: hadoop-hdfs-client splitoff is going to break code

Steve Loughran-3

> On 19 Oct 2015, at 22:01, Colin P. McCabe <[hidden email]> wrote:
>
> Thanks for being proactive here, Steve.

No, just building downstream things. I caught a failure of Spark to build against trunk too, but that's a one-line fix to import the non-deprecated Auth Exception.

>  I think this is a good example of
> why this change should have been done in a branch rather than having been
> done directly in trunk.

Given the size of the change, I'm now convinced that yes, the hadoop-client split should have been done in a branch. What a branch offers is the ability to choose when to merge. As it is, any Hadoop 2.8 release will have this feature. It's going to be visible, and that's going to add more testing. We should expect this to cause things to surface in the release process. We also need to consider what the policy is going to be if 2.8.0 turns out to break something: what are we prepared to roll back?


Re: hadoop-hdfs-client splitoff is going to break code

Kihwal Lee-2
In reply to this post by Colin McCabe-3
I am not sure whether it was mentioned by anyone before, but I noticed that client-only changes do not trigger running any tests in hdfs-precommit. This is because hadoop-hdfs-client does not have any tests.
Kihwal




Re: hadoop-hdfs-client splitoff is going to break code

Haohui Mai-3
All tests that need to spin up a MiniDFSCluster will need to stay in
hadoop-hdfs. Other client-only tests are being moved to the
hadoop-hdfs-client module, which is tracked in HDFS-9168.

~Haohui


Re: hadoop-hdfs-client splitoff is going to break code

Kihwal Lee-2
I think a lot of "client-side" tests use MiniDFSCluster. I know mechanical division is possible, but what about test coverage?
Kihwal
