Is there a .NET equivalent to Apache Hadoop?

C#, .NET, Hadoop, MapReduce

C# Problem Overview


So, I've been looking at Hadoop with keen interest, and to be honest I'm fascinated; things don't get much cooler.

My only minor issue is I'm a C# developer and it's in Java.

It's not that I don't understand Java; rather, I'm looking for a Hadoop.net, an NHadoop, or any .NET project that embraces the Google MapReduce approach. Does anyone know of one?

C# Solutions


Solution 1 - C#

Have you looked at using Hadoop's streaming?

I use it in Python all the time :-).

I'm starting to see that the heterogeneous approach is often the best and it looks like other folks are doing the same.

If you look at projects like Protocol Buffers or Facebook's Thrift, you see that sometimes it's best to use an app written in another language and build the glue in the language of your preference.
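To make the streaming idea concrete, here is a minimal word-count sketch in Python. Hadoop streaming runs any executable as the mapper or reducer, feeding records on standard input and reading tab-separated key/value lines from standard output. The function names (`map_lines`, `reduce_lines`, `main`) and the "map"/"reduce" stage argument are illustrative assumptions, not part of Hadoop.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts per word. Input must already be sorted by key,
    which is what the streaming framework does between the two stages."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def main(stage_name, stream_in=sys.stdin, stream_out=sys.stdout):
    """Hypothetical entry point: stage_name is 'map' or 'reduce'."""
    stage = map_lines if stage_name == "map" else reduce_lines
    for record in stage(stream_in):
        stream_out.write(record + "\n")
```

In a real job, a small script would call `main("map")` or `main("reduce")`, and the two scripts would be handed to the streaming jar through its `-mapper` and `-reducer` options. Because the contract is only stdin and stdout, the same pattern should work for a C# console application.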

Solution 2 - C#

Recently, MySpace released their .NET MapReduce framework, Qizmt, as open source, so this is also a potential contender in this space.

Solution 3 - C#

Solution 4 - C#

I answered your question in my question here

To summarize it here:

Microsoft dropped its alternative (Dryad) in favor of Hadoop. Next year they will release MS SQL Server 2012 with Hadoop integration. Azure and Windows Server support is being developed even as we speak.

It will be available in the first half of 2012.

Hadoop is the #1 big data platform and is going to be supported by both open-source and proprietary stacks (Java, .NET, Python, ...). Even Oracle is adopting it.

If you are developing something on the .NET platform, you should wait for this release.

More information about what is possible will be available here

Solution 5 - C#

I would say that DryadLINQ is the closest thing that we .NET folk have to Hadoop. But it depends on what you want to use Hadoop for. If you are looking for the optimized, self-maintaining distributed file system (DFS), then DryadLINQ isn't what you are looking for. It has an analog to the DFS, but you have to manually build the partitions and distribute each partition.

That being said, if it's the distributed execution aspect of Hadoop that you are looking for, then DryadLINQ is truly wonderful (and no, I'm not affiliated with MS). As long as you have a Microsoft HPC cluster set up, getting going with DryadLINQ is really easy.

The code you write is really just straight LINQ code, except that instead of executing the LINQ on IEnumerable<T> you have to execute it on PartitionedTable<T> (the self-built distributed data structure).

What has really been cool about DryadLINQ is the fast turnaround time (try, test, adjust, repeat) when developing algorithms. You just write LINQ code to do your calculations and DryadLINQ takes care of the whole distributed execution part. It's the most natural analog I've come across that makes writing code for distributed processing just like writing code for a single process.
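The "same query, partitioned data" idea can be sketched outside of .NET as well. The Python snippet below is not the DryadLINQ API; it is only an illustration, with hypothetical names (`count_words`, `run_partitioned`), of writing a query once and then applying it either to a whole collection or to manually built partitions whose partial results are merged. Threads stand in for cluster nodes here.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(records):
    """The 'query': identical whether it sees all the data or one partition."""
    return Counter(word for line in records for word in line.split())


def run_partitioned(query, partitions):
    """Apply the query to each partition in parallel, then merge the partial
    results. Merging by addition works because word counts are additive."""
    total = Counter()
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(query, partitions):
            total.update(partial)
    return total
```

For example, splitting a list of lines into two hand-built partitions and calling `run_partitioned(count_words, partitions)` should give the same counts as calling `count_words` on the full list.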

Solution 6 - C#

You can look into something like RavenDB, which provides very decent MapReduce support for fairly large data sets. Because it is built in .NET, a proper LINQ client API is available.

http://ravendb.net/

To get you started, you can read my blog entry.

Solution 7 - C#

It may be better to use Apache Hadoop with streaming, because Apache Hadoop is actively developed and maintained by industry giants like Yahoo and Facebook, so it can be expected to do what you need.

If you need a solution in .NET, please check the MySpace implementation: [MySpace Qizmt - MySpace’s Open Source Mapreduce Framework][1]

[1]: http://code.google.com/p/qizmt/ "MySpace Qizmt - MySpace’s Open Source Mapreduce Framework"

Solution 8 - C#

Microsoft is in the process of rolling out HDInsight, which is billed as their "100% Apache compatible Hadoop distribution."

It is available both on Windows Server and as a Windows Azure service.

Solution 9 - C#

Microsoft Research has Project Daytona: http://research.microsoft.com/en-us/projects/daytona/

You can download it. There's a WordCount sample in C#.

Solution 10 - C#

You can now use Hadoop directly from .NET. Microsoft has released an SDK to do so.

https://hadoopsdk.codeplex.com/

Of course, this means using the Java-based Hadoop network. But does it matter if the server is running Java? I am sure someone may attempt to port it, but I don't think that would be a good idea: corporations are already backing the Java version, and I don't think a .NET port would get the same attention.

Solution 11 - C#

Have a look at:

http://www.windowsazure.com/en-us/services/hdinsight/

It is an implementation of Hadoop for Azure, and you can use .NET to access it.

Solution 12 - C#

Internally, Microsoft has been using Cosmos. This has been made available outside Microsoft through Azure, under the names Azure Data Lake Analytics and Azure Data Lake Store. Azure Data Lake Analytics is a kind of YARN as a service, and Azure Data Lake Store is WebHDFS as a service. The first version of Azure Data Lake Analytics only hosts U-SQL, a language based on Transact-SQL + C#.

Solution 13 - C#

There's a pretty cute MapReduce implementation for .NET at: http://mapsharp.codeplex.com/

Solution 14 - C#

Dryad/LINQ is being productized and will be released soon: http://blogs.technet.com/b/windowshpc/archive/2011/07/07/announcing-linq-to-hpc-beta-2.aspx Use it in conjunction with Microsoft HPC for a powerful, cluster-based solution for querying unstructured data.

Solution 15 - C#

As others have mentioned, DryadLINQ is a programming framework that allows developers to write LINQ queries and execute them on a cluster, in a similar manner to MapReduce. The DryadLINQ project has recently been released under the Apache license on GitHub, and the release includes support for running on YARN clusters (including Azure HDInsight clusters).

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | danswain | View Question on Stackoverflow |
| Solution 1 - C# | chews | View Answer on Stackoverflow |
| Solution 2 - C# | foxxtrot | View Answer on Stackoverflow |
| Solution 3 - C# | slyi | View Answer on Stackoverflow |
| Solution 4 - C# | NicoJuicy | View Answer on Stackoverflow |
| Solution 5 - C# | Turbo | View Answer on Stackoverflow |
| Solution 6 - C# | Ovais | View Answer on Stackoverflow |
| Solution 7 - C# | Dileep stanley | View Answer on Stackoverflow |
| Solution 8 - C# | Buggieboy | View Answer on Stackoverflow |
| Solution 9 - C# | benjguin | View Answer on Stackoverflow |
| Solution 10 - C# | Dreamwalker | View Answer on Stackoverflow |
| Solution 11 - C# | Stefan Papp | View Answer on Stackoverflow |
| Solution 12 - C# | benjguin | View Answer on Stackoverflow |
| Solution 13 - C# | Zig | View Answer on Stackoverflow |
| Solution 14 - C# | John | View Answer on Stackoverflow |
| Solution 15 - C# | mrry | View Answer on Stackoverflow |