The Big Data Show
  • Videos: 136
  • Views: 855,664
Big Data Mock Interview | Data Engineering Interview | First Round of Interview
Data Engineering Mock Interview
Join Nisha, a Data Engineering professional with over 5 years of experience, and Sai Varun Kumar Namburi for an exciting and informative Data Engineering mock interview session.
If you're preparing for a Data Engineering interview, this is the perfect opportunity to enhance your skills and increase your chances of success. The mock interview simulates a real-life interview scenario and provides valuable insights and guidance. The topics covered include #apachespark SQL, #snowflake, ETL pipelines, data modelling, database technologies, cloud platforms, and more. You'll get to see how professionals tackle technical questions and problem-solving cha...
Views: 268

Videos

How to read from APIs in PySpark codebase...
896 views · 14 days ago
PySpark mini project: Dive into the world of big data processing with our PySpark Practice playlist. This series is designed for both beginners and seasoned data professionals looking to sharpen their Apache Spark skills through scenario-based questions and challenges. Not all inputs come from storage files like JSON, CSV and other formats. There can be cases where you are given a scenario ...
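As a rough illustration of the pattern (not the video's actual code; the endpoint and field names are hypothetical), a minimal PySpark sketch that fetches JSON records from an API on the driver and turns them into a DataFrame:

    # Minimal sketch: ingest a small JSON API response into a Spark DataFrame.
    import requests
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("api-ingest").getOrCreate()

    # Fetch on the driver; suitable for small payloads only.
    response = requests.get("https://api.example.com/stores")  # hypothetical endpoint
    response.raise_for_status()
    records = response.json()  # expected shape: a list of JSON objects

    # Let Spark infer the schema from the parsed records.
    df = spark.createDataFrame(records)
    df.show()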
Data Engineering Interview at top product based company | First Round
3.9K views · 21 days ago
Data Engineering Mock Interview In top product-based companies like #meta #amazon #google #netflix etc, the first round of Data Engineering Interviews checks problem-solving skills. It mostly consists of screen-sharing sessions, where candidates are expected to solve multiple SQL and DSA problems, particularly in #python. We have tried to replicate the same things by asking multiple good SQL an...
First round of Data Engineering Interview at product based company
1 view
Data Engineering Mock Interview Join a Staff Data Engineer & Senior Data Engineer for a wonderful Data Engineering Mock Interview. If you're preparing for a Data Engineering interview, this is the perfect opportunity to enhance your skills and increase your chances of success. The mock interview simulates a real-life scenario and provides valuable insights and guidance. It includes discussion o...
What is topic, partition and offset in Kafka?
464 views · a month ago
This is the third video of our "Kafka for Data Engineers" playlist. In this video, we try to understand topics, partitions and offsets in Apache Kafka in depth. Understanding and visualizing Apache Kafka at its core is very important for grasping its concepts deeply. Stay tuned to this playlist for all upcoming videos. Join me on Social Media: 🔅 Topmate (For collaboration and Scheduli...
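As a rough companion to the video's concepts (the topic name and broker address below are assumptions), a minimal kafka-python consumer that prints each record's topic, partition and offset:

    # Minimal sketch: every record belongs to one topic, lives in exactly one
    # partition, and has a monotonically increasing offset within that partition.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "orders",                           # topic: a named stream of records
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",       # start from the oldest retained offset
    )
    for record in consumer:
        print(record.topic, record.partition, record.offset, record.value)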
Brokers in Apache Kafka | Replication factor & ISR in Kafka
282 views · a month ago
This is the fourth video of our "Kafka for Data Engineers" playlist. In this video, we try to understand brokers, the replication factor and ISR. Understanding and visualizing Apache Kafka at its core is very important for grasping its concepts deeply. Stay tuned to this playlist for all upcoming videos. Join me on Social Media: 🔅 Topmate (For collaboration and Scheduling calls) - t...
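As a hedged illustration (broker address and topic name are assumptions), creating a topic with a replication factor of 3 via kafka-python's admin client; each partition then gets one leader broker plus two followers, and the fully caught-up followers form the ISR (in-sync replicas):

    # Minimal sketch: a replication factor of 3 needs at least 3 brokers.
    from kafka.admin import KafkaAdminClient, NewTopic

    admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
    admin.create_topics([
        NewTopic(name="orders", num_partitions=3, replication_factor=3)
    ])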
Job, Stage and Task in Apache Spark | PySpark interview questions
946 views · a month ago
In this video, we explain the concept of Jobs, Stages and Tasks in Apache Spark and PySpark. We have gone in-depth to help you understand the topic, but it's important to remember that theory alone may not be enough. To reinforce your knowledge, we've created many problems for you to practice on the same topic in the community section of our YouTube channel. You can find a link to all the questions...
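As a minimal sketch of the idea (not the video's code): one action triggers one job, a shuffle boundary splits the job into stages, and each stage runs one task per partition:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("job-stage-task").getOrCreate()

    df = spark.range(1_000_000)                                # narrow: no shuffle
    counts = df.groupBy((df.id % 10).alias("bucket")).count()  # wide: shuffle boundary

    counts.collect()  # the action: triggers one job with (roughly) two stages
    # Stage 1 partially aggregates each input partition (one task per partition);
    # stage 2 reads the shuffled data and produces the final counts. The Spark UI
    # (http://localhost:4040 by default) shows the job/stage/task breakdown.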
Unlocking Apache Kafka: The Secret Sauce of Event Streaming
622 views · a month ago
This is the second video of our "Apache Kafka for Data Engineers" playlist. In this video, we give a brief overview of Apache Kafka and then explore the real meaning of events and event streaming. Understanding and visualizing Apache Kafka at its core is very important for grasping its concepts deeply. Stay tuned to this playlist for all upcoming videos. Join me o...
Unleashing #kafka Magic: What Data Engineers Do with Apache Kafka?
1.3K views · a month ago
This is the first video of our "Apache Kafka for Data Engineers" playlist. In this video, we discuss a real use case: a big data pipeline involving Kafka of the kind often used in the e-commerce industry by companies like Amazon, Walmart, etc. It is very important to understand some of the real use cases of Apache Kafka in the Data Engineering domain. I hope this video will set up the tone for t...
Repartition vs. Coalesce in Apache Spark | PySpark interview questions
495 views · a month ago
During a Data Engineering interview, you may be asked about concepts related to #apachespark. In this video, we explain the difference between Repartition and Coalesce in Apache Spark and PySpark. We go in-depth to help you understand the topic, but it's important to remember that theory alone may not be enough. To reinforce your knowledge, we've created over ten problems for you to practice on ...
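A minimal sketch of the difference (not the video's code): repartition(n) performs a full shuffle and can increase or decrease the partition count, while coalesce(n) only merges existing partitions, so it can only decrease it:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("repartition-vs-coalesce").getOrCreate()

    df = spark.range(1000).repartition(8)      # full shuffle into 8 partitions
    print(df.rdd.getNumPartitions())           # 8

    merged = df.coalesce(2)                    # merges partitions, no full shuffle
    print(merged.rdd.getNumPartitions())       # 2

    grown = merged.coalesce(16)                # coalesce cannot grow the count
    print(grown.rdd.getNumPartitions())        # still 2

    reshuffled = merged.repartition(16)        # repartition can grow it
    print(reshuffled.rdd.getNumPartitions())   # 16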
Apache Spark End-To-End Data Engineering Project | Apple Data Analysis
21K views · a month ago
Dive into the world of big data processing with our PySpark Practice playlist. This series is designed for both beginners and seasoned data professionals looking to sharpen their Apache Spark skills through scenario-based questions and challenges. Each video provides step-by-step solutions to real-world problems, helping you master PySpark techniques and improve your data-handling capabilities....
Sports Data Analysis using PySpark - Part 02
931 views · a month ago
Dive into the world of big data processing with our PySpark Practice playlist. This series is designed for both beginners and seasoned data professionals looking to sharpen their Apache Spark skills through scenario-based questions and challenges. Each video provides step-by-step solutions to real-world problems, helping you master PySpark techniques and improve your data-handling capabilities....
Narrow vs. Wide Transformation in Apache Spark | PySpark interview questions
647 views · a month ago
Sports Data Analysis using PySpark - Part 01
1.3K views · a month ago
Big Data Mock Interview | Data Engineering Interview | First Round of Interview
6K views · a month ago
Data Engineering Interview
4K views · 2 months ago
Data Engineering Interview | PySpark Questions | Manager behavioural questions
6K views · 2 months ago
Data Engineering Interview at top product based company | First Round
11K views · 2 months ago
Big Data Mock Interview | Data Engineering Interview | First Round of Interview
7K views · 3 months ago
Big Data Mock Interview | Data Engineering Interview
16K views · 3 months ago
AWS Data Engineering Interview
21K views · 3 months ago
Data Engineering Interview | System Design
22K views · 3 months ago
System Design round of #dataengineering interview
15K views · 3 months ago
First round of Big Data Engineering #interview
2.4K views · 4 months ago
System Design round of Data Engineering #interview at top product-based company
41K views · 4 months ago
Big Data Mock Interview | First Round
27K views · 4 months ago
Data Engineering Mock Interview at Top Product Based Companies
10K views · 5 months ago
Data Engineering #mockinterview | Myntra | Part 2
17K views · 5 months ago
Data Engineering Mock Interview | Myntra | Part 1
6K views · 5 months ago
Data Engineering Mock Interview | Myntra
32K views · 7 months ago

Comments

  • @adityeshchaturvedi6553 · a day ago

    Great explanation Ankur !!

  • @adityeshchaturvedi6553 · 2 days ago

Great video Ankur. Been following your content and blogs via LinkedIn. Congrats!!

  • @dhruvingandhi1114 · 2 days ago

Hello, I am getting an error when reading the Delta table that is in the default database, at 01:21:50: IllegalArgumentException: Path must be absolute: default.customer_delta_table_persist. Please help me through that.
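    For anyone hitting the same thing: format("delta").load() expects an absolute filesystem path, so passing a metastore table name raises this exception. A minimal sketch of the likely fix, assuming the table is registered in the default database:

      # Read a metastore-registered Delta table by name instead of by path.
      df = spark.read.table("default.customer_delta_table_persist")
      # The path-based form needs an absolute path (hypothetical example):
      # df = spark.read.format("delta").load("/tmp/delta/customer_delta_table_persist")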

  • @unknown_fact1586 · 2 days ago

Please mention the interviewee's experience either in the caption or in the thumbnail. It would be helpful.

  • @Ravi-oy8zl · 3 days ago

The best playlist for anyone who wants to learn Kafka in the data engineering domain. Every video has a clear-cut explanation. Hope it will be continued. It can be a one-stop tutorial for those who want to learn Kafka.

    • @TheBigDataShow · 3 days ago

Thank you for your kind words. I will continue this in a few days. I have been stuck with work for many days. Hope I get free time soon.

  • @teox4571 · 3 days ago

    thanks!

  • @sandeepmodaliar6980 · 6 days ago

The Python program is an interesting one. Assuming a value in the list is a store ID and k is the distance or proximity within which another store with the same ID shouldn't exist, we can store a list as the dict value holding count, start_indx and end_indx. If the count is 2, we check whether the diff, i.e. end_indx - start_indx, is <= k; if it is, we return that value, and otherwise we iterate the dict for the other values.

  • @tejachillapalli8812 · 11 days ago

    dictionary = dict()
    for index, value in enumerate(nums):
        if value in dictionary:
            if abs(dictionary[value] - index) <= k:
                print(True)
                break
        dictionary[value] = index
    else:  # for-else: runs only if the loop never breaks
        print(False)

  • @siddheshchavan2069 · 12 days ago

    Great series, can you upload more such videos with complex problems and bigger datasets

    • @TheBigDataShow · 12 days ago

Please check the other videos from the same playlist. We have uploaded nearly 4 videos.

  • @stylishsannigrahi · 13 days ago

    sum_val = 0

    def sum_of_vals_recursive(my_dict):
        global sum_val
        if type(my_dict) == dict:  # processing logic for a dictionary
            for k, v in my_dict.items():
                if type(v) == int or type(v) == float:  # the value is a simple int/float
                    sum_val = sum_val + v
                else:
                    sum_of_vals_recursive(v)  # invoked for the 1st nested level
        else:  # nested case: the value is a list, set or tuple
            for x in my_dict:  # iteration logic for set, tuple, list
                if type(x) == int or type(x) == float:  # type check of each element
                    sum_val = sum_val + x
                else:
                    sum_of_vals_recursive(x)  # handles the next level of nesting
        return sum_val

    inp_dict = {'ireland': 100, 'india': [200, 300, [400, [200, 200], 400]], 'uk': {'scotland': [50]}}
    print(sum_of_vals_recursive(inp_dict))

  • @014_amitdwivedi6 · 15 days ago

Sir, in the first pipeline I am getting an error that a 'str' object has no attribute 'write'.

    • @TheBigDataShow · 14 days ago

Share the code snippet where you are getting errors. Also, have you searched Stack Overflow for it?

    • @dante421 · 12 days ago

Sir, can you please reply to my question? @TheBigDataShow

  • @user-rh1hr5cc1r · 16 days ago

Parquet stores the file in a hybrid format, not a purely column-based one (80% true).

  • @650jitu · 17 days ago

Is the next video available?

    • @TheBigDataShow · 14 days ago

I am a little busy these days. It will be released in a few days.

    • @RohanKumar-gx3iy · 4 days ago

@TheBigDataShow It's OK, please take your time, but please do continue with the next video; you are teaching it great. Also, please show some practical implementation of Apache Kafka along with the theoretical concepts. It would be very helpful.

  • @dante421 · 18 days ago

Will I be able to switch into data engineering after watching and practicing the project? Will I be able to tell my interviewer that I did this project in my current company?

    • @TheBigDataShow · 12 days ago

Yes, but you have to work hard and learn all the concepts. Just completing one project will not help you get a job. You have to learn multiple technologies and frameworks to get into the Data Engineering domain.

  • @saladilakshminarayana9871 · 19 days ago

Can you please share the code, dataset and API endpoint? Also, we are expecting one more session on the optimal approach for this problem.

  • @VenkatesanVenkat-fd4hg · 20 days ago

Can you share the dataset links?

  • @sarathkumar-tr3is · 21 days ago

    def distinct_ind(l, k):
        seen = {}  # value -> most recent index (avoids shadowing the dict builtin)
        for i in range(len(l)):
            if l[i] in seen and abs(i - seen[l[i]]) <= k:
                return True
            seen[l[i]] = i  # update even when already seen, so the index stays current
        return False

  • @abhiksaha3451 · 21 days ago

Can you also set up data engineering interviews with respect to the GCP ecosystem?

  • @mohitbhandari1106 · 21 days ago

I think the first SQL problem can be done using GROUP BY as well, instead of a window function.

  • @sarathkumar-tr3is · 23 days ago

    2. SQL solution:

    SELECT name
    FROM (
        SELECT e.name,
               DATEDIFF(day, p.promotion_date, l.leave_start) AS d_diff
        FROM employee e
        JOIN promotions p ON e.employee_id = p.employee_id
        JOIN leaves l ON e.employee_id = l.employee_id
    ) A
    WHERE d_diff = 1;

    • @kiranmudradi3927 · 21 days ago

I think d_diff should be >= 1. Let's say an employee got promoted on a date that falls on a Friday and takes leave from the following Monday; the d_diff is then greater than 1, so this record won't be counted, right? Just sharing my thoughts on an edge case.

    • @sarathkumar-tr3is · 21 days ago

@kiranmudradi3927 Hey, thanks for covering that.

  • @sarathkumar-tr3is · 23 days ago

    1. SQL solution (aliasing the rank as rnk, since RANK is a reserved keyword in many dialects):

    SELECT name
    FROM (
        SELECT e.name,
               d.department_name,
               DATEDIFF(day, e.hire_date, p.promotion_date) AS day_count,
               RANK() OVER (
                   PARTITION BY d.department_name
                   ORDER BY DATEDIFF(day, e.hire_date, p.promotion_date) DESC
               ) AS rnk
        FROM employee e
        JOIN promotions p ON e.employee_id = p.employee_id
        JOIN departments d ON e.department_id = d.department_id
    ) A
    WHERE rnk = 1;

  • @DE_Pranav · 23 days ago

    great questions, thank you for this video

  • @VishalSharma-lz6ky · 23 days ago

Awesome mock interview. And the last question was very good: how does it save time if you are reading from disk?

  • @sarathkumar-tr3is · 26 days ago

It would be great if you attached the SQL and DSA questions in the comments or the description.

    • @TheBigDataShow · 26 days ago

      Are the questions not clear from the video?

  • @shubhamkashid6919 · 26 days ago

    Please break down the video into topics.

  • @vishalbhandari8875 · 27 days ago

    WHAT AN EXPLANATION. KEEP UP THE GOOD WORK

  • @santypanda4903 · a month ago

    Is this the full video? Where is the link? I thought it got abruptly cut at the end.

  • @cuccuckute7758 · a month ago

    waiting for 70 days...

  • @Someonner · a month ago

    AMEX also asks the same question

  • @ashwinraje6520 · a month ago

Just completed this project after a lot of debugging. Got to learn about the factory design pattern. Is this pattern typically used in production environments? Thank you Ankur for creating such a quality project!

    • @TheBigDataShow · a month ago

Yes, a lot. Try learning the builder, singleton and companion patterns, and low-level design next.
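      For readers curious about the pattern discussed here, a minimal factory sketch in PySpark; class and function names are hypothetical, not the project's actual code:

        # Factory pattern: callers ask for a reader by format name and never
        # depend on the concrete class they get back.
        from pyspark.sql import SparkSession, DataFrame

        spark = SparkSession.builder.appName("reader-factory").getOrCreate()

        class CSVReader:
            def read(self, path: str) -> DataFrame:
                return spark.read.format("csv").option("header", "true").load(path)

        class ParquetReader:
            def read(self, path: str) -> DataFrame:
                return spark.read.format("parquet").load(path)

        def reader_factory(file_format: str):
            readers = {"csv": CSVReader, "parquet": ParquetReader}
            if file_format not in readers:
                raise ValueError(f"Unsupported format: {file_format}")
            return readers[file_format]()

        # df = reader_factory("csv").read("/data/input.csv")  # hypothetical path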

  • @gagansingh3481 · a month ago

Where can we learn PySpark from scratch to advanced level, with Databricks?

  • @anshusharaf2019 · a month ago

Hey Ankur, Anshu this side. First of all, thanks for your amazing effort. I'm a little bit confused about the source file (the extraction part) you explained in the videos. We have used sources like CSV, Parquet, and Delta tables, but these are the file types where you keep the data as a source; what is the actual source of the data? For example, say we have some ABC database and I export the data in CSV, Parquet, and other file formats; my data source would then be the ABC database. Is that the right way to think about it? @Ankur

  • @MohitKumar-ex1pk · a month ago

Do interviewers really give this much leverage? In most of the interviews I gave, I was not even allowed to use any IDE; I had to write the code in Notepad. I'd suggest all budding Data Engineers practice and remember at least the basic syntax: window, to_date(), groupBy() are very common and used extensively.
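    A quick practice sketch exercising exactly that syntax (data and column names are made up):

      from pyspark.sql import SparkSession, Window
      from pyspark.sql.functions import to_date, col, rank, sum as sum_

      spark = SparkSession.builder.appName("syntax-practice").getOrCreate()

      df = spark.createDataFrame(
          [("s1", "2024-01-01", 10), ("s1", "2024-01-02", 30), ("s2", "2024-01-01", 20)],
          ["store_id", "sale_date", "amount"],
      ).withColumn("sale_date", to_date(col("sale_date"), "yyyy-MM-dd"))

      # groupBy(): total amount per store
      df.groupBy("store_id").agg(sum_("amount").alias("total_amount")).show()

      # window + rank(): order each store's sales by amount
      w = Window.partitionBy("store_id").orderBy(col("amount").desc())
      df.withColumn("rnk", rank().over(w)).show()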

    • @TheBigDataShow · a month ago

Not always, but in some interviews they do allow it. Our aim is to demonstrate more interview-related problems so that they can help interviewees in their preparation.

    • @MohitKumar-ex1pk · a month ago

@TheBigDataShow No doubt about that. Being an experienced DE, I find these mock interview questions very relatable. You guys are doing a great job :-)

  • @debabratabar2008 · a month ago

    Is the below correct?

    df_count = example_df.count()  ----> transformation
    example_df.count()             ----> job?
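    For context, a small sketch (assuming an existing SparkSession named spark): count() is an action, not a transformation, so each call triggers a job; the assignment only stores the returned integer.

      df = spark.range(100)

      df_count = df.count()   # action: triggers job #1, returns a Python int
      df.count()              # action: triggers job #2 (nothing cached in between)

      # A transformation, by contrast, is lazy and triggers no job by itself:
      doubled = df.selectExpr("id * 2 AS id")  # no job until an action runs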